ChatGPT-4 in the Turing Test: A Critical Analysis

Giunti, Marco

arXiv.org Artificial Intelligence

This paper critically examines the recent publication "ChatGPT-4 in the Turing Test" by Restrepo Echavarría (2025), challenging its central claims regarding the absence of minimally serious test implementations and the conclusion that ChatGPT-4 fails the Turing Test. The analysis reveals that the criticisms based on rigid criteria and limited experimental data are not fully justified. More importantly, the paper makes several constructive contributions that enrich our understanding of Turing Test implementations. It demonstrates that two distinct formats--the three-player and two-player tests--are both valid, each with unique methodological implications. The work distinguishes between absolute criteria (reflecting an optimal 50% identification rate in a three-player format) and relative criteria (which measure how closely a machine's performance approximates that of a human), offering a more nuanced evaluation framework. Furthermore, the paper clarifies the probabilistic underpinnings of both test types by modeling them as Bernoulli experiments--correlated in the three-player version and uncorrelated in the two-player version. This formalization allows for a rigorous separation between the theoretical criteria for passing the test, defined in probabilistic terms, and the experimental data that require robust statistical methods for proper interpretation. In doing so, the paper not only refutes key aspects of the criticized study but also lays a solid foundation for future research on objective measures of how closely an AI's behavior aligns with, or deviates from, that of a human being.
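The separation between probabilistic pass criteria and experimental data calls for exactly the kind of statistical treatment the paper advocates. As a minimal illustration (the trial counts below are invented, not taken from either paper), an exact two-sided binomial test checks whether an observed identification rate is compatible with the 50% chance level of the three-player format:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_binomial_p(successes, n, p0=0.5):
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes no more likely than the observed one under H0: p = p0."""
    observed = binom_pmf(successes, n, p0)
    total = 0.0
    for k in range(n + 1):
        pk = binom_pmf(k, n, p0)
        if pk <= observed * (1 + 1e-9):  # tolerance for float round-off
            total += pk
    return min(total, 1.0)

# Hypothetical data: in 60 three-player rounds the interrogator correctly
# identifies the machine 37 times. Is that compatible with the 50%
# chance-level rate an ideally human-like machine would induce?
p_value = two_sided_binomial_p(37, 60)
```

A large p-value here would mean the data cannot distinguish the machine's identification rate from chance, which is the probabilistic sense in which the absolute criterion is formulated.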


Low-rank Bayesian matrix completion via geodesic Hamiltonian Monte Carlo on Stiefel manifolds

Cui, Tiangang, Gorodetsky, Alex

arXiv.org Machine Learning

We present a new sampling-based approach for enabling efficient computation of low-rank Bayesian matrix completion and quantifying the associated uncertainty. Firstly, we design a new prior model based on the singular-value-decomposition (SVD) parametrization of low-rank matrices. Our prior is analogous to the seminal nuclear-norm regularization used in the non-Bayesian setting and enforces orthogonality in the factor matrices by constraining them to Stiefel manifolds. Then, we design a geodesic Hamiltonian Monte Carlo (-within-Gibbs) algorithm for generating posterior samples of the SVD factor matrices. We demonstrate that our approach resolves the sampling difficulties encountered by standard Gibbs samplers for the common two-matrix factorization used in matrix completion. More importantly, the geodesic Hamiltonian sampler allows for sampling in cases with more general likelihoods than the typical Gaussian likelihood and Gaussian prior assumptions adopted in most of the existing Bayesian matrix completion literature. We demonstrate applications of our approach by fitting the categorical data of a mice protein dataset and the MovieLens recommendation problem. Numerical examples demonstrate superior sampling performance, including better mixing and faster convergence to a stationary distribution. Moreover, they demonstrate improved accuracy on the two real-world benchmark problems we considered.
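The paper's sampler moves along geodesics of the Stiefel manifold; as a much simpler illustration of the constraint itself (not of the geodesic integrator), the sketch below re-orthonormalizes a perturbed factor with modified Gram-Schmidt so that U^T U = I is restored after a proposal step. All matrices here are toy data.

```python
import random

def gram_schmidt(cols):
    """Orthonormalize a list of column vectors (modified Gram-Schmidt),
    i.e. map a perturbed factor back onto the Stiefel manifold."""
    ortho = []
    for v in cols:
        w = v[:]
        for u in ortho:
            dot = sum(wi * ui for wi, ui in zip(w, u))
            w = [wi - dot * ui for wi, ui in zip(w, u)]
        norm = sum(wi * wi for wi in w) ** 0.5
        ortho.append([wi / norm for wi in w])
    return ortho

random.seed(0)
n, r = 6, 2  # a rank-2 factor with 6 rows: a point on the Stiefel manifold V_2(R^6)
U = gram_schmidt([[random.gauss(0, 1) for _ in range(n)] for _ in range(r)])

# A proposal step perturbs U off the manifold; re-orthonormalizing
# restores the constraint U^T U = I that the prior enforces.
U_prop = [[ui + 0.1 * random.gauss(0, 1) for ui in col] for col in U]
U_new = gram_schmidt(U_prop)
```

Geodesic HMC avoids this project-back step by integrating directly along the manifold, which is part of why it mixes better than naive samplers, but the orthogonality constraint being maintained is the same.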


Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop

Scheidt, Fabian, Machkour, Jasin, Muma, Michael

arXiv.org Machine Learning

Currently, there is an urgent demand for scalable multivariate and high-dimensional false discovery rate (FDR)-controlling variable selection methods to ensure the reproducibility of discoveries. However, among existing methods, only the recently proposed Terminating-Random Experiments (T-Rex) selector scales to problems with millions of variables, as encountered in, e.g., genomics research. The T-Rex selector is a new learning framework based on early terminated random experiments with computer-generated dummy variables. In this work, we propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption to enable solving FDR-controlled sparse regression problems with millions of variables on a laptop. We incorporate advanced memory-mapping techniques to work with matrices that reside on a solid-state drive and two new dummy generation strategies based on permutations of a reference matrix. Our numerical experiments demonstrate a drastic reduction in memory demand and computation time. We showcase that the Big T-Rex can efficiently solve FDR-controlled Lasso-type problems with five million variables on a laptop in thirty minutes. Our work empowers researchers without access to high-performance clusters to make reproducible discoveries in large-scale high-dimensional data.
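A minimal sketch of the permutation idea (a simplification, not the authors' implementation): dummy predictors are built by shuffling the rows of columns taken from a reference matrix, here the design matrix itself, so each dummy preserves a realistic marginal distribution while carrying no association with the response.

```python
import random

def permutation_dummies(X, num_dummies, rng):
    """Generate dummy predictors by independently permuting the rows of
    columns drawn from a reference matrix (here X itself). Each dummy
    keeps a real column's marginal distribution but destroys any link
    to the response -- a toy version of the permutation strategy."""
    n, p = len(X), len(X[0])
    dummies = []
    for j in range(num_dummies):
        col = [X[i][j % p] for i in range(n)]  # recycle reference columns
        rng.shuffle(col)                       # in-place random permutation
        dummies.append(col)
    return dummies  # list of dummy columns, each of length n

rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]  # toy design matrix
D = permutation_dummies(X, 5, rng)
```

Because the dummies are permutations of columns already held in memory (or on disk via a memory map), they need not be stored as a second full-size random matrix, which is where the RAM savings come from.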


stl2vec: Semantic and Interpretable Vector Representation of Temporal Logic

Saveri, Gaia, Nenzi, Laura, Bortolussi, Luca, Křetínský, Jan

arXiv.org Artificial Intelligence

Bridging symbolic knowledge and machine-learning algorithms is a longstanding challenge in Artificial Intelligence. Despite the recognized importance of this task, a notable gap exists due to the discreteness of symbolic representations and the continuous nature of machine-learning computations. One of the desired bridges between these two worlds would be to define a semantically grounded vector representation (feature embedding) of logic formulae, thus enabling continuous learning and optimization in the semantic space of formulae. We tackle this goal for knowledge expressed in Signal Temporal Logic (STL) and devise a method to compute continuous embeddings of formulae with several desirable properties: the embedding (i) is finite-dimensional, (ii) faithfully reflects the semantics of the formulae, (iii) does not require any learning but instead is defined from basic principles, and (iv) is interpretable.

For example, in STL one can state properties like "the temperature of the room will reach 25 degrees within the next 10 minutes and will stay above 22 degrees for the next hour". In this area, one is typically interested in understanding or verifying which properties the system under analysis is compliant with (or, more precisely, in the probability of observing behaviour satisfying the property). Such analysis is often tackled by formal methods, via algorithms belonging to the world of quantitative model checking [4]. In this work, we address the challenge of incorporating knowledge in the form of temporal logic formulae inside data-driven learning algorithms. The key step is to devise a finite-dimensional embedding (feature mapping) of logical formulae into a continuous space.
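The "faithful to the semantics" requirement refers to STL's quantitative (robustness) semantics, which the quoted temperature property makes concrete. The sketch below, on a made-up temperature trace sampled once per minute, computes the robustness of "reach 25 degrees within the next 10 minutes and stay above 22 degrees for the next hour"; a positive value means the property holds with that margin.

```python
def ev_robustness(signal, lo, hi, predicate):
    """Robustness of 'eventually on [lo, hi]': the best margin in the window."""
    return max(predicate(x) for x in signal[lo:hi + 1])

def glob_robustness(signal, lo, hi, predicate):
    """Robustness of 'globally on [lo, hi]': the worst margin in the window."""
    return min(predicate(x) for x in signal[lo:hi + 1])

# Toy temperature trace, one sample per minute, warming steadily from 23 C.
temp = [23.0 + 0.3 * t for t in range(61)]

# "reach 25 degrees within the next 10 minutes": F_[0,10](temp >= 25)
rho_reach = ev_robustness(temp, 0, 10, lambda x: x - 25.0)
# "stay above 22 degrees for the next hour": G_[0,60](temp >= 22)
rho_stay = glob_robustness(temp, 0, 60, lambda x: x - 22.0)
# Conjunction: the minimum of the two margins; positive => satisfied.
rho = min(rho_reach, rho_stay)
```

An embedding that faithfully reflects the semantics should map formulae with similar robustness profiles across signals to nearby vectors, which is what makes continuous optimization over formulae meaningful.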


A Denoising Diffusion Model for Fluid Field Prediction

Yang, Gefan, Sommer, Stefan

arXiv.org Artificial Intelligence

We propose a novel denoising diffusion generative model for predicting nonlinear fluid fields named FluidDiff. By performing a diffusion process, the model is able to learn a complex representation of the high-dimensional dynamic system, and then Langevin sampling is used to generate predictions for the flow state under specified initial conditions. The model is trained with finite, discrete fluid simulation data. We demonstrate that our model has the capacity to capture the distribution of the simulated training data and that it gives accurate predictions on the test data. Without encoded prior knowledge of the underlying physical system, it achieves performance competitive with other deep learning models for fluid prediction, which is promising for the investigation of new computational fluid dynamics methods.
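The Langevin sampling step can be illustrated in one dimension. The sketch below is not FluidDiff: instead of a learned score network it uses the known score of a standard Gaussian, but the update rule, a drift along the score plus injected noise, is the same one used to draw predictions from the learned distribution.

```python
import math
import random

def langevin_sample(score, x0, step=0.05, n_steps=1500, rng=None):
    """Unadjusted Langevin dynamics: repeatedly move along the score
    (gradient of the log-density) and add Gaussian noise. Diffusion
    models learn this score from noised training data; here we plug in
    the analytic score of N(0, 1) so the target is known."""
    rng = rng or random.Random(0)
    x = x0
    for _ in range(n_steps):
        x = x + 0.5 * step * score(x) + math.sqrt(step) * rng.gauss(0, 1)
    return x

rng = random.Random(123)
score = lambda x: -x  # d/dx log N(x; 0, 1)
# Chains started far from the target still land in the right distribution.
samples = [langevin_sample(score, x0=5.0, rng=rng) for _ in range(300)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

In the paper's setting, the scalar x becomes an entire discretized flow field and the score comes from the trained denoising network, conditioned on the specified initial conditions.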


Learning One Abstract Bit at a Time Through Self-Invented Experiments Encoded as Neural Networks

Herrmann, Vincent, Kirsch, Louis, Schmidhuber, Jürgen

arXiv.org Artificial Intelligence

There are two important things in science: (A) Finding answers to given questions, and (B) Coming up with good questions. Our artificial scientists not only learn to answer given questions, but also continually invent new questions, by proposing hypotheses to be verified or falsified through potentially complex and time-consuming experiments, including thought experiments akin to those of mathematicians. While an artificial scientist expands its knowledge, it remains biased towards the simplest, least costly experiments that still have surprising outcomes, until they become boring. We present an empirical analysis of the automatic generation of interesting experiments. In the first setting, we investigate self-invented experiments in a reinforcement-providing environment and show that they lead to effective exploration. In the second setting, pure thought experiments are implemented as the weights of recurrent neural networks generated by a neural experiment generator. Initially interesting thought experiments may become boring over time.
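The bias toward simple-but-surprising experiments can be caricatured in a few lines. In this toy sketch (the experiment names, costs, and surprise values are all invented), each candidate experiment has a cost and a current surprise estimate; running an experiment halves its surprise, so once-interesting experiments eventually become boring and are abandoned.

```python
def choose_experiment(surprise, cost):
    """Pick the experiment with the best surprise-minus-cost trade-off --
    a toy rendering of the bias toward cheap but surprising experiments."""
    return max(surprise, key=lambda e: surprise[e] - cost[e])

# Hypothetical experiments: "C" is the most surprising but far too costly.
cost = {"A": 0.1, "B": 0.5, "C": 2.0}
surprise = {"A": 1.0, "B": 1.2, "C": 1.5}

history = []
for _ in range(6):
    e = choose_experiment(surprise, cost)
    history.append(e)
    surprise[e] *= 0.5  # running an experiment makes it less surprising ("boring")
```

The cheap experiment is run first and revisited until its surprise decays, the mid-cost one is interleaved as the cheap one gets boring, and the expensive one is never worth running, mirroring the bias described above.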


The Terminating-Knockoff Filter: Fast High-Dimensional Variable Selection with False Discovery Rate Control

Machkour, Jasin, Muma, Michael, Palomar, Daniel P.

arXiv.org Machine Learning

We propose the Terminating-Knockoff (T-Knock) filter, a fast variable selection method for high-dimensional data. The T-Knock filter controls a user-defined target false discovery rate (FDR) while maximizing the number of selected true positives. This is achieved by fusing the solutions of multiple early terminated random experiments. The experiments are conducted on a combination of the original data and multiple sets of randomly generated knockoff variables. A finite sample proof based on martingale theory for the FDR control property is provided. Numerical simulations show that the FDR is controlled at the target level while allowing for a high power. We prove under mild conditions that the knockoffs can be sampled from any univariate distribution. The computational complexity of the proposed method is derived and it is demonstrated via numerical simulations that the sequential computation time is multiple orders of magnitude lower than that of the strongest benchmark methods in sparse high-dimensional settings. The T-Knock filter outperforms state-of-the-art methods for FDR control on a simulated genome-wide association study (GWAS), while its computation time is more than two orders of magnitude lower than that of the strongest benchmark methods.
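A toy rendering of the early-termination idea (a deliberate simplification, not the T-Knock algorithm): each random experiment includes variables in the order a solver would pick them and stops as soon as a set number of knockoffs has entered; fusing several such runs keeps only the variables selected in most of them.

```python
def terminated_experiment(path, max_knockoffs):
    """Run one random experiment: walk a selection path of
    ('orig'|'knock', index) picks and terminate as soon as
    `max_knockoffs` knockoffs have entered, returning the original
    variables selected up to that point."""
    selected, knocks = [], 0
    for kind, idx in path:
        if kind == "knock":
            knocks += 1
            if knocks >= max_knockoffs:
                break
        else:
            selected.append(idx)
    return selected

# Invented selection paths from three hypothetical experiments; true
# variables (1 and 2) tend to enter before the random knockoffs do.
paths = [
    [("orig", 1), ("orig", 2), ("knock", 0), ("orig", 7), ("knock", 1)],
    [("orig", 2), ("orig", 1), ("knock", 0), ("knock", 1)],
    [("orig", 1), ("knock", 0), ("orig", 2), ("knock", 1)],
]
runs = [terminated_experiment(p, max_knockoffs=2) for p in paths]

# Fuse: keep variables selected in at least half of the experiments.
votes = {}
for r in runs:
    for v in r:
        votes[v] = votes.get(v, 0) + 1
fused = sorted(v for v, c in votes.items() if c >= len(runs) / 2)
```

Early termination is also the source of the speed-up reported above: each experiment solves only a short prefix of the selection path instead of the full high-dimensional problem.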


Human biases cause problems for machines trying to learn chemistry

#artificialintelligence

They found that models trained on a small randomised sample of reactions outperformed those trained on larger human-selected datasets. The results show the importance of including experimental results that people might think are unimportant when it comes to developing computer programs for chemists. Machine learning models are a valuable tool in chemical synthesis, but they're trained on data from the literature where positive results are favoured, whereas the dark reactions – the experiments that were tried but didn't work – are usually left out. 'Including these failures is essential for generating predictive machine learning models,' says Joshua Schrier of Fordham University, US, who was part of a team that studied hydrothermal syntheses of amine-templated metal oxides and found that biases were introduced into the literature by people's choices of the reaction parameters. 'We considered extra dark reactions – a class of reactions that humans don't even attempt, not because of scientific or practical reasons, but simply because it's humans who make the decisions,' Schrier says.


How I Learned to Stop Worrying and Love Uncertainty

#artificialintelligence

Since their early days, humans have had an important, often antagonistic relationship with uncertainty; we try to kill it everywhere we find it. Without an explanation for many natural phenomena, humans invented gods to explain them, and without certainty of the future, they consulted oracles. It was precisely the oracle's role to reduce uncertainty for their fellow humans, predicting their future and giving counsel according to their gods' will, and even though their accuracy left much to be desired, they were believed, for any measure of certainty is better than none. As society grew sophisticated, oracles were (not completely) displaced by empirical thought, which proved much more successful at prediction and counsel. Empiricism itself evolved into the collection of techniques we call the scientific method, which has proven to be much more effective at reducing uncertainty, and is modern society's most trustworthy way of producing predictions.